Unsupervised Word Segmentation with Bi-directional Neural Language Model
Abstract
We propose an unsupervised word segmentation model in which, for each unlabelled sentence sample, the learning objective is to maximize the generation probability of the sentence given all of its possible segmentations. This generation probability can be factorized recursively into the likelihood of each possible segment given its context. To capture both long- and short-term dependencies, we use a bi-directional neural language model to better extract features of each segment's context. We also developed two decoding algorithms that combine the context features from both directions to generate the final segmentation at inference time, which helps reconcile word-boundary ambiguities. Experimental results show that our context-sensitive unsupervised segmentation model achieved state-of-the-art results under different evaluation settings on various Chinese datasets, and comparable results for Thai.
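The recursive factorization described in the abstract can be illustrated with a standard segmental dynamic program. This is a minimal sketch, not the authors' implementation: the scorer interface `score(sentence, i, j)`, the `max_len` cap on segment length, and the `combine` interpolation of forward and backward scores are all hypothetical stand-ins for the bi-directional neural language model and for the paper's two decoding algorithms, whose details are not given here. It assumes a segment's context is the raw character string around it rather than the earlier segment boundaries, which is what lets the sum over all segmentations be computed exactly with O(n · max_len) scorer calls.

```python
import math
from typing import Callable, List

# Hypothetical scorer: score(sentence, i, j) returns log p(sentence[i:j] | context).
# In the paper, segment probabilities come from a bi-directional neural language model;
# here any callable works, provided it depends only on the characters and on (i, j).
Scorer = Callable[[str, int, int], float]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def marginal_log_likelihood(sentence: str, score: Scorer, max_len: int = 4) -> float:
    """Forward DP: log of the sum, over all segmentations, of the product of segment probabilities.

    alpha[j] holds the log total probability of every segmentation of sentence[:j].
    """
    n = len(sentence)
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0  # the empty prefix has probability 1
    for j in range(1, n + 1):
        # The last segment is sentence[i:j]; sum over its possible start positions i.
        alpha[j] = logsumexp(
            [alpha[i] + score(sentence, i, j) for i in range(max(0, j - max_len), j)]
        )
    return alpha[n]


def viterbi_segment(sentence: str, score: Scorer, max_len: int = 4) -> List[str]:
    """Same recursion with max instead of sum; returns the highest-scoring segmentation."""
    n = len(sentence)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cand = best[i] + score(sentence, i, j)
            if cand > best[j]:
                best[j], back[j] = cand, i
    segments = []
    j = n
    while j > 0:  # follow back-pointers from the end of the sentence
        segments.append(sentence[back[j]:j])
        j = back[j]
    return segments[::-1]


def combine(forward: Scorer, backward: Scorer, weight: float = 0.5) -> Scorer:
    """Hypothetical direction fusion: interpolate left-to-right and right-to-left log scores."""
    return lambda s, i, j: weight * forward(s, i, j) + (1 - weight) * backward(s, i, j)


if __name__ == "__main__":
    # Toy scorer (hypothetical): known words cost nothing, other spans are heavily penalized.
    lexicon = {"ab", "c"}

    def toy(s: str, i: int, j: int) -> float:
        return 0.0 if s[i:j] in lexicon else -10.0 * (j - i)

    print(viterbi_segment("abc", toy))  # ['ab', 'c']
```

Replacing the sum in `marginal_log_likelihood` with a max yields `viterbi_segment`: a training loop would maximize the former over unlabelled sentences, while the latter picks a single segmentation at inference time. Interpolating log scores, as `combine` does, is only one simple way to merge two directions and is not claimed to match either of the paper's decoders.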
Similar resources
Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation
Recurrent neural networks (RNNs) have been broadly applied to natural language processing (NLP) problems. This kind of neural network is designed for modeling sequential data and has proven quite effective in sequence tagging tasks. In this paper, we propose to use a bi-directional RNN with long short-term memory (LSTM) units for Chinese word segmentation, which is a crucial preprocess ...
Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling
In this paper, we propose a new Bayesian model for fully unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. Our model is a nested hierarchical Pitman-Yor language model, in which a Pitman-Yor spelling model is embedded in the word model. We confirmed that it significantly outperforms previously reported results in both phonetic transc...
Bayesian Unsupervised Word Segmentation with Hierarchical Language Modeling
This paper proposes a novel unsupervised morphological analyzer for arbitrary languages that needs neither supervised segmentation nor a dictionary. Assuming that a string is the output of a nonparametric Bayesian hierarchical n-gram language model of words and characters, “words” are iteratively estimated during inference by a combination of MCMC and efficient dynamic programming. This model c...
Feature-based Neural Language Model and Chinese Word Segmentation
In this paper, we introduce a feature-based neural language model, which is trained to estimate the probability of an element given its previous context features. In this way, our feature-based language model can learn representations for more sophisticated features. We introduced the deep neural architecture into the Chinese Word Segmentation task. We obtained a significant improvement on segmenting p...
Unigram Language Model for Chinese Word Segmentation
This paper describes a Chinese word segmentation system based on a unigram language model for resolving segmentation ambiguities. The system is augmented with a set of pre-processors and post-processors to extract new words in
Journal
Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing
Year: 2022
ISSN: 2375-4699, 2375-4702
DOI: https://doi.org/10.1145/3529387